Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unumpy.average: init #265

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

doronbehar
Copy link

@doronbehar doronbehar commented Oct 21, 2024

  • Addresses (already closed) combined uncertainty from multiple independent measurements? #38
  • Executed ruff check with no errors related to my changes, and ran ruff format on the changed files.
  • The change is fully covered by automated unit tests (Not sure how to do that without simply duplicating the code of the new unumpy.average function.
  • Documented in docs/ as appropriate
  • Added an entry to the CHANGES file

Comment on lines 65 to 75
new_shape = []
# To hold the product of the dimensions to flatten
flatten_size = 1
for i in range(len(arr.shape)):
if i in axes:
flatten_size *= arr.shape[i] # Multiply dimensions to flatten
else:
new_shape.append(arr.shape[i]) # Keep the dimension
# This way the shapes to average over are flattend, in the end.
new_shape.append(flatten_size)
return arr.reshape(*new_shape)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I'm not sure this does the correct thing when the original shape is something like (4,4,4,4,4) and axes is something like (1,3). Suggestions are more then welcome.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I'm not sure this does the correct thing when the original shape is something like (4,4,4,4,4) and axes is something like (1,3). Suggestions are more then welcome.

I fixed that using numpy.apply_along_axis for every axis recursively.

Comment on lines +95 to +103
for axis in sorted(axes, reverse=True):
arr = numpy.apply_over_axis(_average, axis, arr)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm on a 2nd thought, perhaps this means that e.g for A with shape=(3,4,5,7), and axes=(1,3), if there are correlations between A[:,0,:,0] and A[:,1,:,1] are not taken into account? Because once axis=1 is averaged, the covariance_matrix call for the 1d slices of axis 1 won't take into account the correlations to the values already averaged on axis 3...

I think we can live with that, but perhaps warn the users about it in the function's doc, or elsewhere. Unless of course someone here will think of a better way to implement this.

@jagerber48
Copy link
Contributor

It sounds like you are writing code to extract the standard error on the mean of a sequence of UFloats. Or maybe more generally the maximum likelihood estimator of the mean in case the individual observations are correlated? Is that correct? If so this does sound useful to me.

Maybe consider first writing code that does this and then consider extending to taking means along slices of arrays of UFloat's after that? Perhaps the former function can somehow naturally be broadcast or vectorized into the latter.

@doronbehar
Copy link
Author

Hello and thanks for joining the discussion!

It sounds like you are writing code to extract the standard error on the mean of a sequence of UFloats. Or maybe more generally the maximum likelihood estimator of the mean in case the individual observations are correlated? Is that correct? If so this does sound useful to me.

Correct.

Maybe consider first writing code that does this and then consider extending to taking means along slices of arrays of UFloat's after that? Perhaps the former function can somehow naturally be broadcast or vectorized into the latter.

That's what I did eventually in the last attempt, but I'm pretty sure that correlations between values in different axes are not taken into consideration in that case (see my comment above). I reworded the function's __doc__ and added an implementation NOTE comment for maintainers.

@jagerber48
Copy link
Contributor

I would be more open to reviewing this idea if it did not interface with numpy. That is, if you provided a function that takes a collection of UFloat and returns a new UFloat calculated according to inverse variance weighting but which does not rely on numpy.

See #253. I would say that "how uncertainties handles numpy interoperability" is up-in-the-air at the moment. I think it would be pretty easy to get this wrong. Especially if you've already been facing challenges getting things to work along the right axes, or to work according to broadcasting etc.

Also see #282 and the discussion in #277 since the discussion there may impact the fate of this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants